Readme (created on 08/04/2024; updated on 5/13/2025)

For this study, all data processing is done in Python, while data analysis and visualization are done in R.

1. R code for data analysis: "Data analysis.R", which also includes an explanation of the datasets listed below in section 2 and all the variables/columns in them.

2. Datasets for data analysis (input to "Data analysis.R"):
(a) "ThreeEmotionsCH_sentiment scores.csv": The trios of ChatGPT-output emotion words for all lyrics are in column "threeEmotionsCH", and the parsed emotion words are in column "threeEmotionsCH_split". The final sentiment score for each lyric is in column "IntensityWithSign".
(b) "ThreeEmotionsCH_Year2_IntensityWithSign_avg.csv"
(c) "Lyric_split_sentiment scores.csv": The final sentiment score for each lyric is in column "IntensityWithSign".
(d) "Lyrics_split_Year2_IntensityWithSign.csv"
(e) "Taiwan Misery.xlsx": contains historical data on Taiwan's annual GDP growth rate, unemployment rate, inflation rate (CPI), and Economic Misery Index (the sum of the unemployment rate and the CPI inflation rate).
(f) "manual check.xlsx": a manual check of the predictive accuracy of the ChatGPT-assisted and lexicon-only approaches on 200 randomly selected lyrics. Column "IntensityWithSign1": total sentiment score via the ChatGPT-assisted approach. Column "IntensityWithSign2": total sentiment score via the lexicon-only approach. Column "Manual_Check": the manual judgment of whether a lyric is positive ("P") or negative ("N").
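
As an illustration of how the columns in 2(f) could be compared, the following minimal Python sketch computes the share of lyrics whose score sign agrees with the manual label. It is not the code used in "Data analysis.R". The helper names, the toy rows, and the rule that a score of exactly 0 counts as a mismatch are all assumptions made for this example.

```python
def sign_label(score):
    """Map a signed sentiment score to "P" (positive) or "N" (negative)."""
    # Assumption: a score of exactly 0 is left unclassified (None),
    # so it never matches a manual "P" or "N" label.
    if score > 0:
        return "P"
    if score < 0:
        return "N"
    return None


def accuracy(scores, manual_labels):
    """Share of lyrics whose score sign matches the manual label."""
    matches = sum(sign_label(s) == m for s, m in zip(scores, manual_labels))
    return matches / len(manual_labels)


# Hypothetical toy rows, NOT taken from "manual check.xlsx".
rows = [
    {"IntensityWithSign1": 3.5, "IntensityWithSign2": -1.0, "Manual_Check": "P"},
    {"IntensityWithSign1": -2.0, "IntensityWithSign2": -4.0, "Manual_Check": "N"},
    {"IntensityWithSign1": 1.0, "IntensityWithSign2": 2.0, "Manual_Check": "N"},
    {"IntensityWithSign1": -0.5, "IntensityWithSign2": 0.0, "Manual_Check": "N"},
]

manual = [r["Manual_Check"] for r in rows]
acc_chatgpt = accuracy([r["IntensityWithSign1"] for r in rows], manual)
acc_lexicon = accuracy([r["IntensityWithSign2"] for r in rows], manual)

print(f"ChatGPT-assisted accuracy: {acc_chatgpt:.2f}")
print(f"Lexicon-only accuracy:     {acc_lexicon:.2f}")
```

With the real file, the three columns would first be read from "manual check.xlsx" with a spreadsheet reader of your choice.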

3. Python code for data processing:
(a) "GPT.py": sends lyrics to OpenAI's ChatGPT API and receives three emotion words for each lyric in return.
(b) "split3emos2emoWords.py": splits the string of three ChatGPT-output emotion words into analyzable, emotion-centered units/groups of words.
(c) "calc3emotionsChEmos_synonym_expanded.py": calculates the sentiment score for each lyric by applying the lexicon-based method to the three ChatGPT-output emotion words split by 3(b).
(d) "splitChLyrics.py": splits each lyric into analyzable, emotion-centered units/groups of words.
(e) "calcChLyricsEmos.py": calculates the sentiment score for each lyric using only the lexicon-based method. The input is from 3(d).

4. Lexicons (all CSV files) used for calculating lyric sentiments (input to 3(c) and 3(e)):
(a) "negation words.csv": contains common negation words in Chinese.
(b) "degree words.csv": contains common degree adverbs in Chinese.
(c) "DLUT_reshaped_final.csv": the reshaped original CSVOL emotion lexicon used in this study. DLUT stands for Dalian University of Technology, the institution that developed the CSVOL lexicon.
(d) "Anti-DLUT_reshaped_final.csv": contains the same entries as 4(c), but the emotion data are transposed to the antonyms, with emotional intensities all set to 1. It is used when a group of words contains an odd number of negation words.
(e) "Anti-Anti-DLUT_reshaped_final.csv": contains the same entries as 4(c), but with emotional intensities all set to 1. It is used when a group of words contains an even number of negation words.
(f) "DLUT_reshaped_final_Synonym_expanded": expanded version of 4(c) with synonyms incorporated.
(g) "Anti-DLUT_reshaped_final_Synonym_Expanded": expanded version of 4(d) with synonyms incorporated.
(h) "Anti-Anti-DLUT_reshaped_final_Synonym_Expanded": expanded version of 4(e) with synonyms incorporated.
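
The odd/even negation rule described in 4(d) and 4(e) can be sketched in Python as follows. This is only an illustration, not the logic of 3(c) or 3(e) itself; see the flowchart in section 5 for the actual procedure. The function names are hypothetical, the two negation words are a toy subset rather than the contents of "negation words.csv", and the use of the original lexicon 4(c) when no negation word is present is an assumption.

```python
def count_negations(words, negation_words):
    """Count how many words in a group are negation words."""
    return sum(word in negation_words for word in words)


def choose_lexicon(negation_count):
    """Pick a lexicon file name from the number of negation words."""
    if negation_count == 0:
        # Assumption: with no negation, the original lexicon 4(c) applies.
        return "DLUT_reshaped_final.csv"
    if negation_count % 2 == 1:
        # Odd number of negations: antonym lexicon 4(d).
        return "Anti-DLUT_reshaped_final.csv"
    # Even (non-zero) number of negations: lexicon 4(e).
    return "Anti-Anti-DLUT_reshaped_final.csv"


# Toy negation set for illustration only.
toy_negations = {"不", "沒"}

single = count_negations(["不", "快樂"], toy_negations)
double = count_negations(["不", "是", "不", "快樂"], toy_negations)

print(single, choose_lexicon(single))
print(double, choose_lexicon(double))
```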

5. Logic flowchart for 3(e), as well as 3(c): "Logic flow chart for calculating emotion score for each emotion group of words.pdf"

6. Notes: 
(a) Due to copyright restrictions, this repository does not include the corpus of lyrics processed and analyzed in this study. Still, the datasets contain metadata for those lyrics, including song titles, release years, singer names, singer types (male, female, or group), singer regions, lyricists, and composers.
(b) However, partial replication of the results is still possible. In principle, the column "threeEmotionsCH" in 2(a), which contains the three ChatGPT-output emotion words, can be used as the input to 3(b), whose output can in turn be fed into 3(c), so that the sentiment score based on the three ChatGPT-output emotion words can be recreated for every lyric. For this to work, some changes need to be made to the Python files 3(b) and 3(c). First, the paths and input/output file names need to be changed to match your own setup. Second, because of the resource limits of most personal computers, 2(a) needs to be split into 29 smaller csv files (10,000 rows each, except for the 29th, which has fewer), with the file-name suffixes "1", "2", ..., "29", before being fed one by one into 3(b) and 3(c), both of which use multiprocessing for efficiency. Third, the "num_processes=" parameter of the "parallel_apply()" function in both 3(b) and 3(c) needs to be set to the number of virtual cores of your computer's CPU. Lastly, the 29 processed csv files need to be recombined into a single file.
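
The splitting and recombining steps in 6(b) could be done with a short script such as the sketch below, which uses only the Python standard library. It is not part of this repository. The function names, the "stem" argument, and the UTF-8 encoding are assumptions, so adjust them to your own files. The last line shows one way to look up the number of logical (virtual) cores for "num_processes=".

```python
import csv
import os
from pathlib import Path


def write_chunk(out_dir, stem, index, header, rows):
    """Write one chunk as <stem><index>.csv, repeating the header row."""
    path = Path(out_dir) / f"{stem}{index}.csv"
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return path


def split_csv(src, out_dir, stem, chunk_rows=10_000):
    """Split src into files of chunk_rows data rows, suffixed 1, 2, ..."""
    paths = []
    with open(src, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        chunk = []
        for row in reader:
            chunk.append(row)
            if len(chunk) == chunk_rows:
                paths.append(write_chunk(out_dir, stem, len(paths) + 1, header, chunk))
                chunk = []
        if chunk:  # the last file holds the remaining rows
            paths.append(write_chunk(out_dir, stem, len(paths) + 1, header, chunk))
    return paths


def combine_csv(paths, dest):
    """Concatenate the chunk files in order, keeping a single header row."""
    with open(dest, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for i, path in enumerate(paths):
            with open(path, newline="", encoding="utf-8") as f:
                reader = csv.reader(f)
                header = next(reader)
                if i == 0:
                    writer.writerow(header)
                writer.writerows(reader)


# Number of logical cores, a possible value for "num_processes=".
num_processes = os.cpu_count() or 1
```

For example, split_csv("ThreeEmotionsCH_sentiment scores.csv", "chunks", "part") would write "part1.csv", "part2.csv", and so on into an existing "chunks" folder; "chunks" and "part" are placeholder names. After 3(b) and 3(c) have processed each file, pass the processed files to combine_csv in the same order.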

